Estimating hydrogen absorption energy on different metal hydrides using Gaussian process regression approach

Hydrogen is a promising alternative energy source owing to its high energy density, and it can be converted into electricity in energy systems such as fuel cells. The transition toward hydrogen-consuming applications requires a storage method that packs hydrogen at high density. Among the available methods, absorbing hydrogen in a host metal is applicable at room temperature and pressure and raises no safety concerns. In this regard, AB2 metal hydrides, with their potentially high hydrogen density, are selected as an appropriate host. Machine learning techniques were applied to relate the chemical composition of these hosts to their hydrogen storage behavior. For this purpose, a data bank of 314 data pairs was used. The different A-site and B-site elements served as the input variables, and the hydrogen absorption energy was the output. A robust Gaussian process regression (GPR) approach with four kernel functions is proposed to predict the hydrogen absorption energy from the inputs. All the GPR models performed well; notably, GPR with the Exponential kernel function showed the highest precision, with R2, MRE, MSE, RMSE, and STD of 0.969, 2.291%, 3.909, 2.501, and 1.878, respectively. Additionally, the sensitivity analysis indicated that Zr, Ti, and Cr are the most determining elements in this system.

Energy demand has increased exponentially in recent years, reaching over 18 TW, and this growth is expected to continue in the coming years 1 . Fossil fuels currently account for over 80% of global energy consumption [2][3][4][5] . However, because of environmental concerns, the transition to renewable energy sources is critical 6 . In this regard, hydrogen can revolutionize renewable energy systems as a fuel and a clean energy carrier, and it could be the basis for establishing carbon-free fuels 7,8 . Hydrogen has become one of the most popular energy sources in recent years because it has a higher energy content and causes fewer environmental problems than fossil fuels 9 . Its energy density of 142 MJ kg−1 is far higher than that of fossil fuels, at 47 MJ kg−1 10 . It is estimated that about 35% of European vehicles will be hydrogen-powered by 2040 9 . In addition, hydrogen energy is projected to provide around 34% of the world's energy demand by 2050 11 . Even though hydrogen is a prevalent element in nature, it is rarely found in pure form. As a result, several chemical, electrochemical, photoelectrochemical, thermal, and microbiological approaches have been developed for producing it [12][13][14] . More than 50 million tons of hydrogen are produced worldwide each year 15 .
Hydrogen may be stored in three primary ways: gas, liquid, and solid-phase storage. Solid-phase storage is one of the most promising storage technologies owing to its ability to operate at room temperature and atmospheric pressure, as well as its excellent safety and low energy loss [16][17][18][19][20][21] . Metal hydrides have attracted attention as solid-state hydrogen storage materials [22][23][24][25][26][27] and are produced by the absorption of hydrogen molecules in a metallic or intermetallic host 28 . The gravimetric density of hydrogen absorbed in these compounds is about 1-3 wt% 5,29 . Different metal hydrides have been identified and examined so far, including AB, AB2, AB3, AB5, and A2B, in which A and B are two types of metals or groups of metals. The AB2 metal hydride is the most promising type for hydrogen storage owing to its easy activation, fast kinetics, and favorable pressure conditions 30 . In AB2 alloys, the A site is occupied by hydride-forming elements such as Ti, Zr, Ta, and Hf, while the B site is occupied by transition metals such as Fe, Co, Ni, Mn, Cr, and V 31,32 . The Laves phases of the AB2 metal hydrides are C14 and C15, with hexagonal and face-centered cubic structures, respectively 33 . Thus, the selection of A-site and B-site elements in AB2 compounds influences their hydrogen storage performance. Traditional approaches to investigating the effect of different elements or dopants on the hydrogen storage properties, such as basic laws, computational modeling, and experimental investigations, are costly, time-consuming, and involve numerous trials and errors, which makes them challenging and inefficient. Thus, to save time, energy, and cost, particularly when a complicated nonlinear relationship exists between the parameters and the performance, machine learning (ML) techniques could be effective alternative assessment methods.
ML has become a prominent research field and approach for developing and selecting advanced energy materials in recent years [34][35][36][37][38][39][40] . So far, various machine learning algorithms have been applied to hydrogen storage in metal hydride systems. For example, Griffin and Darsey used artificial neural networks to estimate the entropy, enthalpy, temperature at 1 atm, pressure at 25 °C, and weight percent of hydrogen stored in metal hydrides. For these parameters, the average R2 values were 0.8888, 0.9561, 0.9381, 0.9935, and 0.9569, respectively 41 . To estimate the hydrogen storage capacity of metal hydrides, Rahnama et al. utilized four models: linear regression, neural network, Bayesian linear regression, and boosted decision tree. The R2 values of these models were 0.50, 0.60, 0.56, and 0.83, respectively, indicating that the boosted decision tree performed better than the other models 42 . In another study, Rahnama et al. classified metal hydrides using four classifiers: multiclass logistic regression, multiclass decision forest, multiclass decision jungle, and multiclass neural network. The accuracies of these models were 0.47, 0.60, 0.62, and 0.80, respectively, indicating that the multiclass neural network classifier performed best. This classification was based on the properties of metal hydrides, including the weight percentage of hydrogen, heat of formation, and operating temperature and pressure 43 . Suwarno et al. used multivariate regression, decision tree, and random forest models to estimate the heat of formation, phase abundance, and hydrogen storage capacity of AB2 metal hydrides. The random forest model performed best among the three, with an average R2 value of 0.722 44 . Determining the pressure-composition-temperature (PCT) curve is another important issue in metal hydrides. It was addressed by Kim et al., who used random forest (RF), K-nearest neighbor (KNN), and deep neural network (DNN) models. The DNN model performed best among the three, with an average R2 of 0.9307 45 .
In the present study, for the first time, the Gaussian process regression (GPR) model with four kernel functions was used to estimate the energy of hydrogen absorption (ΔH) on the surface of the hydride alloys. The A-site and B-site elements of AB2 compounds were chosen as input variables to establish a relationship between the chemical composition of AB2 and its hydrogen storage properties. For this purpose, a substantial experimental data bank was used. The developed model was evaluated by several error and statistical parameters. In addition, a sensitivity analysis was performed to find the most determining elements in hydrogen storage on metal hydrides.

Methodology
Data collection. A set of 314 AB2 alloy data pairs was collected from the literature 44 and is presented in the Supplementary Information. It includes the constituent elements and the ΔH of absorption (in kJ/(mol H2)).
It is worth mentioning that, for some of these alloys, ΔH is only implied by the pressure-composition-temperature diagram and is not stated explicitly in the publications. In these cases, ΔH was calculated from the van't Hoff law:

$$\ln P_{eq} = \frac{\Delta H}{RT} - \frac{\Delta S}{R} \tag{1}$$

The equilibrium pressure P_eq was determined by choosing a midpoint on the plateau of the pressure-composition graph. T is the temperature of the pressure-composition isotherm, which is constant, and R is the universal gas constant. The term ΔS is assumed to have a constant value of −110 J/(mol H2 K).
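As a worked illustration of this calculation, the short sketch below solves the van't Hoff relation for ΔH. It assumes the common form ln P = ΔH/(RT) − ΔS/R, a plateau pressure expressed relative to a reference pressure of 1 atm, and an entropy term taken in J/(mol H2 K); the function name and the example plateau point are hypothetical and are not taken from the data bank.

```python
import math

R_GAS = 8.314  # universal gas constant in J/(mol K)


def absorption_enthalpy_kj(p_eq, temperature_k, delta_s=-110.0):
    """Solve ln(p_eq) = dH/(R*T) - dS/R for dH; return kJ/(mol H2).

    p_eq          plateau pressure relative to the reference pressure
                  (assumed to be 1 atm in this sketch)
    temperature_k temperature of the isotherm in K
    delta_s       assumed constant entropy term in J/(mol H2 K)
    """
    delta_h = R_GAS * temperature_k * math.log(p_eq) + temperature_k * delta_s
    return delta_h / 1000.0


# Hypothetical plateau midpoint: 5 atm on a 298.15 K isotherm.
print(round(absorption_enthalpy_kj(5.0, 298.15), 2))
```

Under these assumptions a higher plateau pressure at the same temperature gives a less negative ΔH, that is, a less stable hydride.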
As depicted in Fig. 1, the 22 alloying elements Si, Mo, Fe, C, Ni, Co, Zr, La, Cu, Gd, Al, Mn, Ti, Ce, W, B, Mg, V, Ho, Cr, Sn, and Nb are the input parameters, while ΔH is the output of the model, so that the effect of each element on the hydrogen storage conditions can be examined. In this work, 70% of the data were randomly selected as training data to develop the model, and the remaining 30% were used as testing data to evaluate the model's accuracy. Several statistical factors were calculated to quantify the precision of the established model, including R2, standard deviation (STD), mean-square error (MSE), mean relative error (MRE), and root-mean-square error (RMSE). Considering y and x as the predicted and experimental values, respectively, these factors are defined as follows:

$$R^{2} = 1 - \frac{\sum_{i=1}^{N}(x_i - y_i)^{2}}{\sum_{i=1}^{N}(x_i - \bar{x})^{2}} \tag{2}$$

$$STD = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(e_i - \bar{e})^{2}} \tag{3}$$

$$MSE = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^{2} \tag{4}$$

$$MRE = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{x_i - y_i}{x_i}\right| \tag{5}$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^{2}} \tag{6}$$

where N is the number of data points, x̄ is the mean of the experimental values, and e_i = x_i − y_i is the prediction error with mean ē.

Gaussian process regression. In comparison to support vector machines and artificial neural networks, Gaussian process regression is easy to implement, since its hyperparameters can be obtained adaptively. In addition, this method provides the confidence interval (i.e., the uncertainty) of the model prediction 35,46 .
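In GPR modeling, L = {(x_{L,i}, y_{L,i})} and T = {(x_{T,i}, y_{T,i})} are the arbitrarily selected training and testing data sets, with input and output parameters x and y, respectively. The modeling begins with:

$$y_{L,i} = f(x_{L,i}) + \varepsilon_{L,i} \tag{7}$$

$$\varepsilon \sim \mathcal{N}\left(0, \sigma_{noise}^{2} I_n\right) \tag{8}$$

where ε, σ²_noise, and I_n are the observation noise, the variance of the noise, and the identity matrix, respectively. Similarly, for the test data:

$$y_{T,i} = f(x_{T,i}) + \varepsilon_{T,i} \tag{9}$$

In the GPR method, f(x) is a random function defined by its covariance function k(x, x′) (also called the kernel) and its mean function m(x):

$$f(x) \sim \mathcal{GP}\left(m(x), k(x, x')\right) \tag{10}$$

Although m(x) can be obtained by applying explicit basis functions, for simplicity it is usually assumed to be zero 47 :

$$f(x) \sim \mathcal{GP}\left(0, k(x, x')\right) \tag{11}$$

From Eqs. (7) and (11), y is obtained as:

$$y \sim \mathcal{N}\left(0, K(x, x) + \sigma_{noise}^{2} I_n\right) \tag{12}$$

Now, based on the introduced parameters:

$$y_L \sim \mathcal{N}\left(0, K(x_L, x_L) + \sigma_{noise}^{2} I_L\right) \tag{13}$$

$$y_T \sim \mathcal{N}\left(0, K(x_T, x_T) + \sigma_{noise}^{2} I_T\right) \tag{14}$$

Combining these two expressions yields the joint Gaussian distribution:

$$\begin{bmatrix} y_L \\ y_T \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(x_L, x_L) + \sigma_{noise}^{2} I_L & K(x_L, x_T) \\ K(x_T, x_L) & K(x_T, x_T) + \sigma_{noise}^{2} I_T \end{bmatrix}\right) \tag{15}$$

To obtain the distribution of y_T, the conditioning rule of Gaussians can be used:

$$(y_T \mid y_L) \sim \mathcal{N}(\mu_T, \Sigma_T) \tag{16}$$

$$\mu_T = K(x_T, x_L)\left[K(x_L, x_L) + \sigma_{noise}^{2} I_L\right]^{-1} y_L \tag{17}$$

$$\Sigma_T = K(x_T, x_T) + \sigma_{noise}^{2} I_T - K(x_T, x_L)\left[K(x_L, x_L) + \sigma_{noise}^{2} I_L\right]^{-1} K(x_L, x_T) \tag{18}$$

with Σ_T and μ_T as the covariance and the mean value, respectively. The core of GPR is the kernel function, which generates a covariance matrix that measures the "distance" between two data points. Different kernel functions compute this distance differently, which affects the strength and robustness of the final GPR model 48 . In the present study, four kernel functions, namely Matern, Rational quadratic, Exponential, and Squared exponential, are compared to find the most appropriate one. With r = ‖x − x′‖, they are defined as follows:

• Matern kernel function:

$$k(x, x') = \sigma^{2}\,\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}\,r}{\ell}\right)^{\nu} K_{\nu}\!\left(\frac{\sqrt{2\nu}\,r}{\ell}\right) \tag{19}$$

• Rational quadratic kernel function:

$$k(x, x') = \sigma^{2}\left(1 + \frac{r^{2}}{2\alpha\ell^{2}}\right)^{-\alpha} \tag{20}$$

• Exponential kernel function:

$$k(x, x') = \sigma^{2}\exp\left(-\frac{r}{\ell}\right) \tag{21}$$

• Squared Exponential kernel function:

$$k(x, x') = \sigma^{2}\exp\left(-\frac{r^{2}}{2\ell^{2}}\right) \tag{22}$$

In these equations, ℓ, σ, σ², and α > 0 denote the length scale, the amplitude, the variance, and the scale-mixture parameter, respectively. Also, ν, K_ν, and Γ represent a positive parameter, the modified Bessel function, and the gamma function, respectively.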
In the present study, GPR models based on the four kernel functions were developed in MATLAB (version 2018), and their capabilities to estimate the absorption enthalpy were compared.
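The prediction step described above can be illustrated with a small, dependency-free Python sketch. It is not the MATLAB implementation used in this work: the hyperparameters are fixed rather than fitted, the Matern kernel is written in its closed form for ν = 5/2 (an assumed choice), the small linear solver is a plain Gaussian elimination, and the compositions and ΔH values are made up for illustration.

```python
import math


def exponential(a, b, ell=1.0, sigma=1.0):
    return sigma ** 2 * math.exp(-math.dist(a, b) / ell)


def squared_exponential(a, b, ell=1.0, sigma=1.0):
    return sigma ** 2 * math.exp(-math.dist(a, b) ** 2 / (2 * ell ** 2))


def rational_quadratic(a, b, ell=1.0, sigma=1.0, alpha=1.0):
    r2 = math.dist(a, b) ** 2
    return sigma ** 2 * (1 + r2 / (2 * alpha * ell ** 2)) ** (-alpha)


def matern52(a, b, ell=1.0, sigma=1.0):
    # Closed form of the Matern kernel for nu = 5/2 (assumed value of nu).
    s = math.sqrt(5) * math.dist(a, b) / ell
    return sigma ** 2 * (1 + s + s ** 2 / 3) * math.exp(-s)


KERNELS = {"Exponential": exponential,
           "Squared exponential": squared_exponential,
           "Rational quadratic": rational_quadratic,
           "Matern (nu = 5/2)": matern52}


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gpr_predict(x_train, y_train, x_test, kernel, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at each test point."""
    n = len(x_train)
    k_ll = [[kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    weights = solve(k_ll, y_train)  # (K_LL + noise * I)^-1 y_L
    means, variances = [], []
    for xt in x_test:
        k_tl = [kernel(xt, xl) for xl in x_train]
        means.append(sum(k * w for k, w in zip(k_tl, weights)))
        v = solve(k_ll, k_tl)
        reduction = sum(k * vi for k, vi in zip(k_tl, v))
        variances.append(kernel(xt, xt) + noise_var - reduction)
    return means, variances


# Made-up compositions (Ti, Zr, Cr fractions) and dH values in kJ/(mol H2).
x_train = [(1.0, 0.0, 2.0), (0.5, 0.5, 2.0), (0.0, 1.0, 2.0), (0.5, 0.5, 1.0)]
y_train = [-20.0, -30.0, -40.0, -33.0]
y_mean = sum(y_train) / len(y_train)
y_centered = [y - y_mean for y in y_train]  # zero-mean prior, so center outputs
x_test = [(0.25, 0.75, 2.0)]

for name, kernel in KERNELS.items():
    mean, var = gpr_predict(x_train, y_centered, x_test, kernel)
    print(f"{name}: dH = {mean[0] + y_mean:.2f}, variance = {var[0]:.3f}")
```

The predictive variance returned alongside the mean is what provides the confidence interval of each prediction; it shrinks near the training points and returns to the prior variance far from them.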
Data set outlier detection. Owing to errors in experiments or calculation methods, some of the collected data behave differently from the other data points; these are known as suspected data or outliers. Such data in the data bank degrade the predictions of the established models. Accordingly, the presence of suspected data should be investigated to improve the quality of the collected data bank. For this purpose, the Leverage method is used, which defines the Hat matrix and the critical leverage limit as follows:

$$H = U\left(U^{T}U\right)^{-1}U^{T} \tag{23}$$

$$H^{*} = \frac{3(i+1)}{j} \tag{24}$$

where U is the j × i matrix of model inputs, and i and j are the number of model parameters and the number of training data points, respectively. To assess the quality of the collected data bank, the Williams plot is used, in which the standardized residuals are plotted against the Hat values. As shown in Fig. 2, most data lie in the reliable area. In detail, for all the developed GPR models, only 13 or 14 of the 314 data points (about 4%) fall outside the reliable zone, confirming that the collected data set is appropriate for training and testing.
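The leverage screening described above can be sketched in a few lines of Python. The sketch assumes the usual conventions, which are stated here as assumptions: U holds one row per training point and one column per model input, the leverage of a point is the corresponding diagonal entry of H = U(UᵀU)⁻¹Uᵀ, and the critical limit is 3(p + 1)/n for p inputs and n points. The names n, p, and the single-input data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hat_values(u):
    """Diagonal of H = U (U^T U)^-1 U^T for an n x p matrix U (list of rows)."""
    p = len(u[0])
    utu = [[sum(row[a] * row[b] for row in u) for b in range(p)]
           for a in range(p)]
    return [sum(x * s for x, s in zip(row, solve(utu, list(row))))
            for row in u]


def critical_leverage(n_points, n_params):
    """Critical leverage limit H* = 3 (p + 1) / n."""
    return 3.0 * (n_params + 1) / n_points


# Made-up single-input data: eleven regular points and one far from the rest.
u = [[float(v)] for v in range(1, 12)] + [[30.0]]
h = hat_values(u)
h_star = critical_leverage(len(u), len(u[0]))
suspected = [i for i, hi in enumerate(h) if hi > h_star]
print(h_star, suspected)
```

In a Williams plot these hat values form the horizontal axis, with H* drawn as a vertical limit; only the isolated point exceeds it in this toy data set.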

Results and discussion
Sensitivity analysis. To determine the effect of each element on the absorption enthalpy, a sensitivity analysis is performed. The relevancy factor, a metric that indicates how strongly a parameter affects the output, is derived from the following expression:

$$r_k = \frac{\sum_{i=1}^{N}\left(X_{k,i} - \bar{X}_k\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{k,i} - \bar{X}_k\right)^{2}\,\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^{2}}} \tag{25}$$

where X_{k,i} and Y_i represent the i-th value of the k-th input and the i-th output, while the average values of the input and the output are denoted by X̄_k and Ȳ, respectively. An input parameter with a larger absolute value of r has a greater effect on the outcome. A positive sign indicates that the output increases with the parameter, and a negative sign indicates that it decreases. According to the sensitivity analysis (Fig. 3), Ti and Zr are the most effective elements in the ΔH of hydrogen absorption, with relevancy factors of −38.47% and 38.38%, respectively. The opposite signs arise because these elements substitute for each other on the A site: when the Ti content increases, the Zr content decreases, and vice versa. This result was expected because, as discussed, the A-site element (here Ti and Zr) in AB2 structures is the hydride-forming element and significantly affects the hydrogen absorption energy of the alloy 49 . Among the remaining elements, V and Cr have the largest influence.

Statistical parameters are presented to evaluate the performance of the developed models in predicting the hydrogen absorption ΔH. They are calculated and listed in Table 1 for the training, testing, and overall data sets. In the training phase, R2 values of 0.976, 0.976, 0.95, and 0.966 were obtained for the GPR-Exponential, GPR-Matern, GPR-Squared Exponential, and GPR-Rational Quadratic models, respectively. Their low MRE, MSE, RMSE, and STD values, especially for the GPR-Exponential model, confirm that all the GPR models were trained with sufficient precision. The models were then used to predict new (testing) data to examine their robustness. Based on Table 1, all the developed models showed acceptable capability in ΔH prediction. The GPR-Exponential model is slightly more accurate than the others, with R2 = 0.969, MRE = 2.291%, MSE = 3.909, RMSE = 2.501, and STD = 1.878.

The experimental and predicted values of the hydrogen absorption ΔH for all the models are compared in Fig. 4. The predictions of all the proposed GPR models are in excellent agreement with the actual ΔH values, and the prediction lines follow the data points accurately.
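For readers who wish to reproduce the statistical factors reported above on their own predictions, the following sketch computes them under conventional definitions, which are assumptions here: MRE is the mean absolute relative error in percent, and STD is the sample standard deviation of the prediction errors. The function name and the four data pairs are made up for illustration.

```python
import math
import statistics


def regression_metrics(experimental, predicted):
    """R2, MRE (%), MSE, RMSE, and STD of the errors (conventional forms)."""
    errors = [x - y for x, y in zip(experimental, predicted)]
    n = len(errors)
    mean_x = sum(experimental) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((x - mean_x) ** 2 for x in experimental)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MRE": 100.0 / n * sum(abs(e / x) for e, x in zip(errors, experimental)),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "STD": statistics.stdev(errors),  # sample standard deviation (N - 1)
    }


# Made-up experimental and predicted dH values in kJ/(mol H2).
experimental = [-30.0, -25.0, -40.0, -35.0]
predicted = [-29.0, -26.0, -38.0, -36.0]
metrics = regression_metrics(experimental, predicted)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

With these definitions RMSE is always the square root of MSE, and the STD of the errors can never exceed the RMSE by more than the factor sqrt(N/(N − 1)), which gives a quick consistency check on any reported set of values.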
The cross plots for all the GPR models are depicted in Fig. 5. In these graphs, the bisector of the first quadrant is the measure of accuracy: the closer the data lie to this line, the more precise the developed model. To assess the results of the GPR models further, the relative deviation between the actual and predicted hydrogen absorption ΔH is calculated and illustrated in Fig. 6. In each of the developed GPR models, most of the absolute relative deviations are smaller than 10%. Also, the GPR model with the Exponential kernel function has the minimum mean relative error, 2.291%, compared with GPR-Matern (2.957%), GPR-Squared Exponential (4.416%), and GPR-Rational Quadratic (3.929%).
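The relevancy factor used in the sensitivity analysis of this section can be illustrated with a short sketch. It assumes the factor is the Pearson correlation between one input and the output, and it uses made-up Ti and Zr fractions that sum to one, so that the two A-site elements substitute for each other; the ΔH values are likewise hypothetical.

```python
import math


def relevancy_factor(inputs, outputs):
    """Pearson correlation between one input variable and the output."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    return cov / math.sqrt(var_x * var_y)


# Made-up A-site fractions: Ti and Zr always sum to one.
ti = [0.0, 0.25, 0.5, 0.75, 1.0]
zr = [1.0 - t for t in ti]
dh = [-22.0, -28.0, -31.0, -37.0, -40.0]  # hypothetical dH, kJ/(mol H2)

r_ti = relevancy_factor(ti, dh)
r_zr = relevancy_factor(zr, dh)
print(f"r(Ti) = {100 * r_ti:.2f}%, r(Zr) = {100 * r_zr:.2f}%")
```

Because the Zr fraction is one minus the Ti fraction in this toy data, the two factors have equal magnitude and opposite sign, which mirrors the near-symmetric values reported for Ti and Zr.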

Conclusion
To predict the hydrogen absorption ΔH of AB2 alloys, a machine learning approach based on Gaussian process regression (GPR) with four different kernel functions (Exponential, Matern, Squared Exponential, and Rational Quadratic) was assessed. The 22 different alloying elements were used as the inputs. All the developed GPR models performed very well. Among them, the GPR-Exponential model was slightly more accurate than the others, with R2, MRE, MSE, RMSE, and STD of 0.969, 2.291%, 3.909, 2.501, and 1.878, respectively, and was chosen as the best one. According to the sensitivity analysis, Ti and Zr, along with V and Cr, contribute the most to the change in hydrogen absorption ΔH. The results of the present work could give researchers a perspective for choosing appropriate elements for AB2 alloys for hydrogen storage.